

Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect

Kouwenhoven, Tom, Shahrasbi, Kiana, Verhoef, Tessa

arXiv.org Artificial Intelligence

Recent advances in multimodal models have raised questions about whether vision-and-language models (VLMs) integrate cross-modal information in ways that reflect human cognition. One well-studied test case in this domain is the bouba-kiki effect, where humans reliably associate pseudowords like 'bouba' with round shapes and 'kiki' with jagged ones. Given the mixed evidence found in prior studies for this effect in VLMs, we present a comprehensive re-evaluation focused on two variants of CLIP, ResNet and Vision Transformer (ViT), given their centrality in many state-of-the-art VLMs. We apply two complementary methods closely modelled after human experiments: a prompt-based evaluation that uses probabilities as a measure of model preference, and a novel Grad-CAM-based approach to interpreting visual attention in shape-word matching tasks. Our findings show that these model variants do not consistently exhibit the bouba-kiki effect. While ResNet shows a preference for round shapes, overall performance across both model variants lacks the expected associations. Moreover, direct comparison with prior human data on the same task shows that the models' responses fall markedly short of the robust, modality-integrated behaviour characteristic of human cognition. These results contribute to the ongoing debate about the extent to which VLMs truly understand cross-modal concepts, highlighting limitations in their internal representations and alignment with human intuitions.
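The prompt-based evaluation described in the abstract can be illustrated with a minimal sketch: CLIP-style models score an image against candidate text prompts by cosine similarity, scale by a learned temperature, and softmax to obtain a preference probability. The similarity values, prompt wording, and temperature below are placeholders for illustration, not the paper's actual setup.

```python
import numpy as np

def prompt_preference(sims, temperature=0.01):
    """Turn image-text similarity scores into preference probabilities.

    sims: cosine similarities between one shape image and a set of
    pseudoword prompts (e.g. "a shape called bouba" vs. "a shape
    called kiki"). CLIP-style models divide similarities by a learned
    temperature before the softmax; 0.01 (logit scale ~100) is a
    common default, assumed here for illustration.
    """
    logits = np.asarray(sims, dtype=float) / temperature
    logits -= logits.max()            # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Placeholder similarities for a round shape against the two prompts:
p = prompt_preference([0.28, 0.25])   # [P("bouba"), P("kiki")]
```

A bouba-kiki preference would show up as P("bouba") consistently exceeding P("kiki") for round shapes (and the reverse for jagged ones) across many shape-pseudoword pairs; the paper reports that this pattern does not hold consistently.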


What does Kiki look like? Cross-modal associations between speech sounds and visual shapes in vision-and-language models

Verhoef, Tessa, Shahrasbi, Kiana, Kouwenhoven, Tom

arXiv.org Artificial Intelligence

Humans have clear cross-modal preferences when matching certain novel words to visual shapes. Evidence suggests that these preferences play a prominent role in our linguistic processing, language learning, and the origins of signal-meaning mappings. With the rise of multimodal models in AI, such as vision-and-language models (VLMs), it becomes increasingly important to uncover the kinds of visio-linguistic associations these models encode and whether they align with human representations. Informed by experiments with humans, we probe and compare four VLMs for a well-known human cross-modal preference, the bouba-kiki effect. We do not find conclusive evidence for this effect but suggest that results may depend on features of the models, such as architecture design, model size, and training details. Our findings inform discussions on the origins of the bouba-kiki effect in human cognition and future developments of VLMs that align well with human cross-modal associations.


Language: People across cultures agree the word 'bouba' sounds round while 'kiki' sounds pointy

Daily Mail - Science & tech

From English to Chinese, Hungarian and Zulu, people who speak different languages make the same links between sounds and shapes, a new study shows. An international research team has conducted the largest cross-cultural test of the 'bouba-kiki effect' – the tendency to associate the made-up words 'bouba' with a round shape and 'kiki' with a spiky shape. The researchers surveyed 917 speakers of 25 different languages representing nine language families and 10 writing systems. They found the effect exists independently of the language that a person speaks or the writing system that they use, whether it's the Roman alphabet (A, B, C), the Greek alphabet (alpha, beta, gamma) or Chinese characters (北, 方, 话). Such universally meaningful vocalisations may form a global basis for the creation of new words, such as terms that circulate on social media.

Figure: Bouba and kiki shapes used in the experiment.